Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support DuckDB struct syntax and support list of struct syntax #1372

Merged
merged 14 commits into from
Aug 15, 2024

Conversation

jayzhan211
Copy link
Contributor

Close #1245

Signed-off-by: jayzhan211 <[email protected]>
@coveralls
Copy link

coveralls commented Aug 9, 2024

Pull Request Test Coverage Report for Build 10404149988

Details

  • 114 of 118 (96.61%) changed or added relevant lines in 4 files are covered.
  • 5 unchanged lines in 2 files lost coverage.
  • Overall coverage increased (+0.02%) to 89.143%

Changes Missing Coverage Covered Lines Changed/Added Lines %
tests/sqlparser_bigquery.rs 33 34 97.06%
tests/sqlparser_duckdb.rs 59 60 98.33%
src/ast/data_type.rs 4 6 66.67%
Files with Coverage Reduction New Missed Lines %
src/ast/mod.rs 1 81.47%
tests/sqlparser_bigquery.rs 4 90.27%
Totals Coverage Status
Change from base Build 10390800057: 0.02%
Covered Lines: 28260
Relevant Lines: 31702

💛 - Coveralls

Signed-off-by: jayzhan211 <[email protected]>
@@ -1074,10 +1074,14 @@ impl<'a> Parser<'a> {
Keyword::MATCH if dialect_of!(self is MySqlDialect | GenericDialect) => {
self.parse_match_against()
}
Keyword::STRUCT if dialect_of!(self is BigQueryDialect | GenericDialect) => {
Keyword::STRUCT if dialect_of!(self is BigQueryDialect | GenericDialect ) => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you revert this line?

self.prev_token();
self.parse_bigquery_struct_literal()
}
Keyword::STRUCT if dialect_of!(self is DuckDbDialect ) => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Keyword::STRUCT if dialect_of!(self is DuckDbDialect ) => {
Keyword::STRUCT if dialect_of!(self is DuckDbDialect) => {

Comment on lines 2478 to 2484
match self.peek_token().token {
Token::RParen => {
self.next_token();
false.into()
}
_ => return self.expected(")", self.peek_token()),
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
match self.peek_token().token {
Token::RParen => {
self.next_token();
false.into()
}
_ => return self.expected(")", self.peek_token()),
}
self.expect_token(&Token::RParen)?;
false.into()

@@ -7229,10 +7274,18 @@ impl<'a> Parser<'a> {
))))
}
}
Keyword::STRUCT if dialect_of!(self is DuckDbDialect) => {
self.prev_token();
// let field_defs=
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// let field_defs=

tests/sqlparser_duckdb.rs Show resolved Hide resolved
@git-hulk
Copy link
Member

git-hulk commented Aug 9, 2024

cc @iffyio @alamb

Signed-off-by: jayzhan211 <[email protected]>
@jayzhan211 jayzhan211 changed the title Support DuckDB struct symtax Support DuckDB struct syntax and support list of struct syntax Aug 9, 2024
Copy link
Member

@git-hulk git-hulk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, great.

@jayzhan211
Copy link
Contributor Author

Thanks for your review @git-hulk

@@ -2227,7 +2228,15 @@ impl<'a> Parser<'a> {

Ok((
field_defs,
self.expect_closing_angle_bracket(trailing_bracket)?,
if token == Token::Lt {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a slight mismatch I'm thinking might not be worth having - the open delimeter token is configurable now but here the function still needs to figure out for itself what the value is for the closing token.
The function is a bit complicated on its own, specifically due to the < delimeter. then since duckdb syntax doesn't have that I'm thinking it makes sense to use a dedicated function that is simpler.

Something roughly like this I think should be all duckdb need, would it make sense to add this instead and have duckdb (and any future dialect sharing the same syntax) use it?

   fn parse_duck_db_struct_type_def<F>(
        &mut self,
    ) -> Result<Vec<StructField>, ParserError>
    {
        let start_token = self.peek_token();
        self.expect_keyword(Keyword::STRUCT)?;

        self.expect_token(Token::LParen)?;

        let field_defs = self.parse_comma_separated(Self::parse_struct_field_def)?;

        self.expect_token(Token::RParen)?;

        Ok(field_defs)
    }

This comment was marked as outdated.

Comment on lines 24 to 32
#[test]
fn test_struct() {
// nested struct
let canonical = r#"CREATE TABLE t1 (s STRUCT<v VARCHAR, s STRUCT<a1 INTEGER, a2 VARCHAR>>[])"#;
let sql = r#"CREATE TABLE t1 (s STRUCT<v VARCHAR, s STRUCT<a1 INTEGER, a2 VARCHAR>>[])"#;
let select = bigquery().parse_sql_statements(sql).unwrap().pop().unwrap();
// TODO: '>>' is incorrect parsed in bigquery syntax
assert_ne!(select.to_string(), canonical);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure I follow if/why this test is needed, since bigquery doesn't support [] syntax for arrays?

let field_defs = self
.parse_comma_separated(Self::parse_struct_field_def)?
.into_iter()
.map(|(f, _)| f)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh this looks like a bug for us to silently drop the returned flag since there might be an extraneous >, seems like we would instead do something like this

let field_defs = self.parse_comma_separated(|self| {
    Ok(StructField {
        field_name: Some(self.parse_identifier(false)?), // I assume field name isn't optional for duckdb?
        field_type: self.parse_data_type(),
    })

src/parser/mod.rs Show resolved Hide resolved
Comment on lines 38 to 39
let canonical = r#"CREATE TABLE t1 (s STRUCT<v VARCHAR, i INTEGER>)"#;
let sql = r#"CREATE TABLE t1 (s STRUCT(v VARCHAR, i INTEGER))"#;
Copy link
Contributor

@iffyio iffyio Aug 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I think this is incorrect, no need for the input and output to not match, in this case the output is significantly different syntax from what went into the parser. We can probably add a parens_delimeter: bool or similar flag to the StructField struct so that it knows how to display accordingly?

Copy link
Contributor Author

@jayzhan211 jayzhan211 Aug 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not clear to me. Do you mean adding flag so we have different canonical name for struct<> and struct()?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah that's what I meant - my bad realised I wrote comma_delimeter but meant to write parens_delimeter instead. Yeah so like if that flag is true display uses struct (...) otherwise struct<...>

@alamb
Copy link
Contributor

alamb commented Aug 14, 2024

I am hoping to make a sqlparser release soon (like tomorrow) -- do we think this PR is close to ready (aka should I hold the release for this PR?)

Signed-off-by: jayzhan211 <[email protected]>
@jayzhan211
Copy link
Contributor Author

I am hoping to make a sqlparser release soon (like tomorrow) -- do we think this PR is close to ready (aka should I hold the release for this PR?)

It is close. We can release together with this PR

Signed-off-by: jayzhan211 <[email protected]>
@jayzhan211 jayzhan211 requested a review from iffyio August 15, 2024 00:46
src/parser/mod.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@iffyio iffyio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor cleanup comment, otherwise LGTM! cc @alamb

tests/sqlparser_duckdb.rs Outdated Show resolved Hide resolved
@alamb
Copy link
Contributor

alamb commented Aug 15, 2024

I am working on updating the tests and adding some small comments. I'll be done shortly

@@ -32,6 +32,118 @@ fn duckdb_and_generic() -> TestedDialects {
}
}

#[test]
fn test_struct() {
// s STRUCT(v VARCHAR, i INTEGER)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the tests to also have coverage of the actual parsed data types

@@ -618,6 +625,17 @@ fn format_clickhouse_datetime_precision_and_timezone(
Ok(())
}

/// Type of brackets used for `STRUCT` literals.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also added some comments on this struct

@alamb
Copy link
Contributor

alamb commented Aug 15, 2024

Thanks again @jayzhan211 and @iffyio and @git-hulk

@alamb alamb merged commit 8c4d30b into apache:main Aug 15, 2024
10 checks passed
@jayzhan211
Copy link
Contributor Author

Great! Thanks @iffyio @alamb and @git-hulk

@jayzhan211 jayzhan211 deleted the duckdb-struct branch August 15, 2024 13:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support struct data definition like duckdb
5 participants